8 research outputs found
On the detection of SOurce COde re-use
© Owner/Author | ACM 2014. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in FIRE '14 Proceedings of the Forum for Information Retrieval Evaluation, http://dx.doi.org/10.1145/2824864.2824878
This paper summarizes the goals, organization and results
of the first SOCO competitive evaluation campaign for systems
that automatically detect the source code re-use phenomenon.
The detection of source code re-use is an important
research field for both the software industry and academia.
Accordingly, the PAN@FIRE track named SOurce COde
Re-use (SOCO) focused on the detection of re-used source
code in the C/C++ and Java programming languages. Participant
systems were asked to annotate several source codes
as to whether or not they represent cases of source code re-use.
In total five teams submitted 17 runs. The training set
consisted of annotations made by several experts, a feature
which turns the SOCO 2014 collection into a useful data set
for future evaluations and, at the same time, establishes
a standard evaluation framework for future research
on the posed shared task.
PAN@FIRE (SOCO) has been organised in the framework of the WIQ-EI (EC IRSES grant n. 269180) and DIANA-APPLICATIONS (TIN2012-38603-C02-01) research projects. The work of the last author was supported by CONACyT Mexico Project Grant CB-2010/153315, and SEP-PROMEP UAM-PTC-380/48510349.
Flores Sáez, E.; Rosso, P.; Moreno Boronat, LA.; Villatoro-Tello, E. (2014). On the detection of SOurce COde re-use. In FIRE '14 Proceedings of the Forum for Information Retrieval Evaluation. ACM. 21-30. https://doi.org/10.1145/2824864.2824878
Towards the detection of cross-language source code reuse
The Internet has made huge amounts of information available,
including source code. Source code repositories and, in general,
programming-related websites facilitate its reuse. In this work, we propose a simple
approach to the detection of cross-language source code reuse, a barely
investigated problem. Our preliminary experiments, based on character
n-gram comparison, show that considering different sections of the
code (i.e., comments, code, reserved words, etc.) leads to different results.
When considering three programming languages (C++, Java, and
Python), the best result is obtained when comments are discarded and
the entire source code is considered.
This work has been developed with the support of the project TEXT-ENTERPRISE 2.0: Text comprehension techniques applied to the needs of the Enterprise 2.0 (MICINN, Spain, TIN2009-13391-C04-03 (Plan I+D+i)).
Flores Sáez, E.; Barrón Cedeño, LA.; Rosso, P.; Moreno Boronat, LA. (2011). Towards the detection of cross-language source code reuse. In Natural Language Processing and Information Systems. Springer Verlag (Germany). 6716:250-253. https://doi.org/10.1007/978-3-642-22327-3_31
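The character n-gram comparison mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trigram size, the lowercasing and whitespace removal, the cosine measure, and the three toy snippets are all assumptions made for illustration, and comments are assumed to have been stripped beforehand.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and removing whitespace."""
    normalized = "".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical snippets in three languages, with comments already discarded.
java_code = "for (int i = 0; i < n; i++) { total += values[i]; }"
cpp_code = "for (int i = 0; i < n; ++i) { total += values[i]; }"
python_code = "print('hello world')"

sim_reused = cosine_similarity(char_ngrams(java_code), char_ngrams(cpp_code))
sim_unrelated = cosine_similarity(char_ngrams(java_code), char_ngrams(python_code))
```

In this toy setting the near-identical Java and C++ loops score higher than the unrelated Python line; a real system would additionally need a decision threshold, which the abstract does not specify.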
PAN@FIRE: Overview of SOCO Track on the Detection of SOurce COde Re-use
© Owner/Author. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM, in Proceedings of the Forum for Information Retrieval Evaluation, FIRE '14. http://dx.doi.org/10.1145/2824864.2824878
This paper summarizes the goals, organization and results
of the first SOCO competitive evaluation campaign for systems that automatically
detect the source code re-use phenomenon. The detection of
source code re-use is an important research field for both the software industry
and academia. Accordingly, the PAN@FIRE task named SOurce
COde Re-use (SOCO) focused on the detection of re-used source code
in the C/C++ and Java programming languages. Participant systems were
asked to annotate several source codes as to whether or not they represent
cases of source code re-use. In total three teams participated and submitted
13 runs. The training set consisted of annotations made by several
experts, a feature which turns the SOCO 2014 collection into a useful data
set for future evaluations and, at the same time, establishes a standard
evaluation framework for future research.
PAN@FIRE (SOCO) has been organised in the framework of the WIQ-EI (EC IRSES grant n. 269180) and DIANA-APPLICATIONS (TIN2012-38603-C02-01) research projects. The work of the last author was supported by CONACyT Mexico Project Grant CB-2010/153315, and SEP-PROMEP UAM-PTC-380/48510349.
Flores Sáez, E.; Rosso, P.; Moreno Boronat, LA.; Villatoro-Tello, E. (2014). PAN@FIRE: Overview of SOCO Track on the Detection of SOurce COde Re-use. ACM. http://hdl.handle.net/10251/66414
Cross-language source code re-use detection using latent semantic analysis
[EN] Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many more resources are just one click away. The temptation to re-use these materials is very high. Even source code is easily available through a simple search on the Web. There is therefore a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled version. When dealing with cross-language source code re-use, traditional approaches can deal only with the programming languages supported by the compiler. We assume that a source code is a piece of text, with its syntax and structure, so we aim at applying models for free text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models, being able to distinguish between re-used and related source codes with high performance.
This work was partially supported by Universitat Politècnica de València, WIQ-EI (IRSES grant n. 269180), and the DIANA-APPLICATIONS (TIN2012-38603-C02-01) project. The work of the fourth author is also supported by VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.
Flores Sáez, E.; Barrón-Cedeño, LA.; Moreno Boronat, LA.; Rosso, P. (2015). Cross-language source code re-use detection using latent semantic analysis. Journal of Universal Computer Science. 21(13):1708-1725. https://doi.org/10.3217/jucs-021-13-1708
A low-power RF front-end for 2.5 GHz receivers
© 2008 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
This paper presents a low-power, low-cost front end for a direct-conversion 2.5 GHz ISM band receiver composed of a 16 kV HBM ESD-protected LNA, differential Gilbert-cell mixers, and high-pass filters for DC offset cancellation. The whole front end is implemented in a 2P6M 0.18 µm RFCMOS process. It exhibits a voltage gain of 24 dB and an SSB noise figure of 8.4 dB, which make it suitable for most 2.5 GHz short-range wireless communication transceivers. The achieved power consumption is only 1.06 mW from a 1.2 V power supply.
Peer Reviewed. Postprint (published version)
IARG-AnCora: Annotating AnCora corpus with implicit arguments
[EN] IARG-AnCora aims to annotate the implicit arguments of deverbal nominalizations in
the AnCora corpus with thematic roles. This corpus will be the basis for automatic semantic role labeling
systems based on machine learning techniques. Semantic analyzers are essential components of
current language technology applications, in which a deeper
understanding of the text is needed to make higher-level inferences and thereby obtain qualitative
improvements in the results.
Complementary action (FFI2011-13737-E), associated with the TextMess 2.0 project (TIN2009-13391-C04-03/04).
Taulé Delor, M.; Peris, A.; Martí Antonín, MA.; Moreno Boronat, LA.; Rodríguez, H.; Moreda, P. (2012). IARG-AnCora: Anotación de los corpus AnCora con argumentos implícitos. Procesamiento del Lenguaje Natural. 49:181-184. http://hdl.handle.net/10251/29863
MALLBA: A library of skeletons for combinatorial optimisation
The MALLBA project tackles the resolution of combinatorial optimization problems using algorithmic skeletons implemented in C++. MALLBA offers three families of generic resolution methods: exact, heuristic and hybrid. Moreover, for each resolution method, MALLBA provides three different implementations: sequential, parallel for local area networks, and parallel for wide area networks (currently under development). This paper shows the architecture of the MALLBA library, presents some of its skeletons and offers several computational results to show the viability of the approach.
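The idea of an algorithmic skeleton, as described in the abstract above, is that the generic resolution method is written once and the user supplies only the problem-specific parts. The sketch below illustrates this in Python for a simple heuristic (hill climbing). It is not MALLBA's API, which is a C++ library; the function name `local_search`, its parameters, and the toy problem are hypothetical.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def local_search(
    initial: S,
    neighbours: Callable[[S], Iterable[S]],
    cost: Callable[[S], float],
    max_steps: int = 1000,
) -> S:
    """Generic hill-climbing skeleton: only the arguments are problem-specific."""
    current = initial
    for _ in range(max_steps):
        # Pick the cheapest neighbour; fall back to the current solution if there is none.
        best = min(neighbours(current), key=cost, default=current)
        if cost(best) >= cost(current):
            return current  # no neighbour improves: local optimum reached
        current = best
    return current


# Hypothetical toy problem: minimise (x - 7)^2 over the integers.
result = local_search(
    initial=0,
    neighbours=lambda x: (x - 1, x + 1),
    cost=lambda x: (x - 7) ** 2,
)
```

Because the skeleton only depends on the three callables, the same driver could in principle be swapped for a parallel one without touching the problem definition, which is the separation the abstract attributes to the sequential and parallel implementations.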
MALLBA: a library of skeletons for combinatorial optimisation
The MALLBA project tackles the resolution of combinatorial optimization problems using algorithmic skeletons implemented in C++. MALLBA offers three families of generic resolution methods: exact, heuristic and hybrid. Moreover, for each resolution method, MALLBA provides three different implementations: sequential, parallel for local area networks, and parallel for wide area networks (currently under development). This paper shows the architecture of the MALLBA library, presents some of its skeletons and offers several computational results to show the viability of the approach.